Complexity Analysis of Real-Time Reinforcement Learning Applied to Finding Shortest Paths in Deterministic Domains
Authors: Sven Koenig and Reid G. Simmons (School of Computer Science, Carnegie Mellon University)
Abstract
This report analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problems of reaching a goal state from a given start state and finding shortest paths from all states to a goal state. Previous work had concluded that, in many cases, initially uninformed (i.e. tabula rasa) reinforcement learning was exponential for such problems, or that it was tractable (i.e. of polynomial time-complexity) only if the learning algorithm was augmented. We prove that, to the contrary, the algorithms are tractable with only a simple change in the task representation ("penalizing the agent for action executions") or initialization ("initializing high"). We provide tight bounds on their worst-case complexity, and show how the complexity is even smaller if the state space has certain special properties. We compare these reinforcement learning algorithms to other uninformed on-line search methods and to informed off-line search methods, and investigate how initial knowledge of the topology of the state space can decrease their complexity. We also present two novel algorithms, the bi-directional Q-learning algorithm and the bi-directional value-iteration algorithm, for finding shortest paths from all states to a goal state, and show that they are no more complex than their counterparts for reaching a goal state from a given start state. The worst-case analysis of the reinforcement learning algorithms is complemented by an empirical study of their average-case complexity in three domains.
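The two fixes named in the abstract are easy to make concrete. Below is a minimal sketch (the `GridWorld` and `run_episode` names are illustrative, not code from the report) of asynchronous real-time Q-learning in a deterministic grid world under the action-penalty representation: every action execution is rewarded with -1, the goal is absorbing with value 0, and the discount factor is 1. Zero-initialized Q-values are then "initialized high" (optimistic), since all true Q-values are negative, which is the kind of setup for which the report proves polynomial worst-case complexity.

```python
# Minimal sketch of asynchronous real-time Q-learning in a deterministic
# grid world (illustrative code, not the authors' implementation).
# Reward is -1 per action execution ("penalizing the agent for action
# executions"); with zero-initialized Q-values this initialization is
# optimistic, since all true Q-values are negative.

from collections import defaultdict

class GridWorld:
    """Deterministic 4-connected grid; bumping a wall leaves the state unchanged."""
    def __init__(self, width, height, goal):
        self.width, self.height, self.goal = width, height, goal
        self.actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def step(self, state, action):
        x, y = state
        dx, dy = action
        nx, ny = x + dx, y + dy
        if 0 <= nx < self.width and 0 <= ny < self.height:
            return (nx, ny)
        return state  # wall bump

def run_episode(world, start, q=None, max_steps=100_000):
    """Real-time Q-learning: act greedily on the current Q-values and
    update only the Q-value of the executed action (a one-step,
    asynchronous backup interleaved with action execution)."""
    q = q if q is not None else defaultdict(float)  # zero init = optimistic here
    state, steps = start, 0
    while state != world.goal and steps < max_steps:
        # Greedy action selection w.r.t. current Q-values (ties broken arbitrarily).
        action = max(world.actions, key=lambda a: q[(state, a)])
        succ = world.step(state, action)
        # Deterministic one-step backup with discount factor 1:
        # reward -1 for the action execution, goal value 0.
        v_succ = 0.0 if succ == world.goal else max(q[(succ, a)] for a in world.actions)
        q[(state, action)] = -1.0 + v_succ
        state, steps = succ, steps + 1
    return q, steps

world = GridWorld(10, 10, goal=(9, 9))
q_values, steps = run_episode(world, start=(0, 0))
print(f"reached goal in {steps} steps")
```

Re-running `run_episode` with the same Q-table drives the greedy policy toward a shortest path from the start state; the bi-directional variants mentioned in the abstract address the all-states version of the problem without being asymptotically more complex.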
Similar Resources
Complexity Analysis of Real-Time Reinforcement Learning
Real-Time Reinforcement Learning. Sven Koenig and Reid G. Simmons, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3891. [email protected], [email protected]. Abstract: This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in d...
Complexity Analysis of ...
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm wa...
Non-Deterministic Policies in Markovian Decision Processes
Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision-making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct decision support systems for action selection in Markovian environments. Although conventional meth...
Improve End to End Delay with Q-Routing Algorithm
This paper proposes and evaluates a QoS routing policy, called K-Shortest Widest Paths Q-Routing, for packet-switched communication networks with irregular topology and traffic. The reinforcement-signal technique used for the evaluation is Q-learning. Compared to standard Q-Routing, the exploration of paths is limited to the K best loop-free paths in terms of hop count (number of routers in a path), leading ...
Finding Real-Valued Single-Source Shortest Paths in o(n³) Expected Time
Given an n-vertex, m-edge directed network G with real costs on the edges and a designated source vertex s, we give a new algorithm to compute shortest paths from s. Our algorithm is a simple deterministic one with O(n² log n) expected running time over a large class of input distributions. This is the first strongly polynomial algorithm in over 35 years to improve upon some aspect of the O(nm) r...
Publication year: 1992